library(tidyverse)
library(lubridate)
Here is our first task:
The project goal is to identify patients seen for drug overdose, determine if they had an active opioid at the start of the encounter, and if they had any readmissions for drug overdose.
Your task is to assemble the study cohort by identifying encounters that meet the following criteria:
Sounds great. Let’s start by taking a look at the data.
allergies <- read_csv("datasets/allergies.csv")
Parsed with column specification:
cols(
  START = col_date(format = ""),
  STOP = col_date(format = ""),
  PATIENT = col_character(),
  ENCOUNTER = col_character(),
  CODE = col_double(),
  DESCRIPTION = col_character()
)
allergies
encounters <- read_csv("datasets/encounters.csv")
Parsed with column specification:
cols(
  Id = col_character(),
  START = col_datetime(format = ""),
  STOP = col_datetime(format = ""),
  PATIENT = col_character(),
  PROVIDER = col_character(),
  ENCOUNTERCLASS = col_character(),
  CODE = col_double(),
  DESCRIPTION = col_character(),
  COST = col_double(),
  REASONCODE = col_double(),
  REASONDESCRIPTION = col_character()
)
encounters
medications <- read_csv("datasets/medications.csv")
Parsed with column specification:
cols(
  START = col_date(format = ""),
  STOP = col_date(format = ""),
  PATIENT = col_character(),
  ENCOUNTER = col_character(),
  CODE = col_double(),
  DESCRIPTION = col_character(),
  COST = col_double(),
  DISPENSES = col_double(),
  TOTALCOST = col_double(),
  REASONCODE = col_double(),
  REASONDESCRIPTION = col_character()
)
medications
patients <- read_csv("datasets/patients.csv")
Parsed with column specification:
cols(
  .default = col_character(),
  BIRTHDATE = col_date(format = ""),
  DEATHDATE = col_date(format = ""),
  ZIP = col_double()
)
See spec(...) for full column specifications.
patients
procedures <- read_csv("datasets/procedures.csv")
Parsed with column specification:
cols(
  DATE = col_date(format = ""),
  PATIENT.x = col_character(),
  ENCOUNTER = col_character(),
  CODE.x = col_character(),
  DESCRIPTION.x = col_character(),
  COST.x = col_double(),
  REASONCODE.x = col_double(),
  REASONDESCRIPTION.x = col_character()
)
procedures
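The "Parsed with column specification" messages show the types readr guessed for each column. As a minimal sketch, the same types can be declared up front through the col_types argument, which silences the message and makes a type change in the file fail loudly instead of passing unnoticed. The file written here (toy_path) and its two rows are invented so the example runs without the real datasets.

```r
library(readr)

# Write a tiny invented CSV so the example does not depend on datasets/encounters.csv.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Id,START,STOP,PATIENT,REASONCODE",
  "e1,1999-08-01T10:00:00Z,1999-08-01T11:00:00Z,p1,55680006",
  "e2,2001-02-03T09:30:00Z,,p2,"
), toy_path)

# Declare the column types explicitly instead of letting readr guess them.
toy_encounters <- read_csv(
  toy_path,
  col_types = cols(
    Id = col_character(),
    START = col_datetime(format = ""),
    STOP = col_datetime(format = ""),
    PATIENT = col_character(),
    REASONCODE = col_double()
  )
)

# Empty fields are read as NA (the second row has no STOP and no REASONCODE).
print(toy_encounters)
```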
OK, we are chiefly interested in the encounters table, and want to filter it according to the specifications given in the task. Let’s start by filtering the encounters for drug overdoses. According to the data dictionary sheet for the encounters table, the REASONCODE column contains SNOMED-CT codes.
We can look up the code for a drug overdose at https://browser.ihtsdotools.org/, which lists it as 55680006.
drug_overdoses <- filter(encounters, REASONCODE == 55680006)
drug_overdoses
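Before relying on a hard-coded code, it can help to confirm which description the data itself attaches to it. The sketch below does this on a small stand-in table (toy_encounters, a hypothetical name). Its rows are invented, and the code 12345 and its label are placeholders.

```r
library(dplyr)

# Invented stand-in for the encounters table; only 55680006 comes from the text above.
toy_encounters <- tibble(
  Id = c("e1", "e2", "e3", "e4"),
  REASONCODE = c(55680006, 55680006, 12345, NA),
  REASONDESCRIPTION = c("Drug overdose", "Drug overdose", "Other reason (placeholder)", NA)
)

# Tabulate code/description pairs to see what each code is labelled as in the data.
reason_lookup <- count(toy_encounters, REASONCODE, REASONDESCRIPTION, sort = TRUE)
print(reason_lookup)

# filter() drops rows where the condition evaluates to NA,
# so encounters without a reason code are excluded as well.
toy_overdoses <- filter(toy_encounters, REASONCODE == 55680006)
print(toy_overdoses$Id)
```

On the real data, the equivalent check would be running count() on encounters and looking for the row with REASONCODE 55680006.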
Great, next we need to filter for encounters that occur after July 15, 1999.
The encounters table has two columns that represent the date of the encounter, START and STOP. Further clarification would be necessary to determine whether the task is to find encounters that begin after 07/15/1999 or encounters that end after that date. For the purposes of this exercise, we’ll go with encounters that begin after that date, because the specification uses the term "occur".
after_date <- filter(drug_overdoses, START > "1999-07-15")
arrange(after_date, START)
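Comparing START against the string "1999-07-15" leaves two things implicit: the time zone used when R converts the string to a date-time, and whether encounters that start during July 15 itself count as "after" that date. The sketch below makes both explicit, assuming START was parsed as UTC (readr’s default for datetimes). The timestamps and the name toy_starts are invented for illustration.

```r
library(dplyr)
library(lubridate)

# Invented START values on either side of the cutoff.
toy_starts <- tibble(
  Id = c("a", "b", "c"),
  START = ymd_hms(
    c("1999-07-14 23:00:00", "1999-07-15 08:30:00", "1999-07-16 00:00:00"),
    tz = "UTC"
  )
)

# Reading 1: later than midnight at the start of July 15, with the time zone stated.
cutoff <- ymd("1999-07-15", tz = "UTC")
after_midnight <- filter(toy_starts, START > cutoff)

# Reading 2: strictly after the calendar day July 15, i.e. July 16 onward.
after_day <- filter(toy_starts, as_date(START) > ymd("1999-07-15"))

print(after_midnight$Id)  # encounters "b" and "c"
print(after_day$Id)       # encounter "c" only
```

Which reading is intended is a question for whoever wrote the specification. The two differ only for encounters that start on July 15, 1999.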
Now we’re concerned with encounters with patients between the ages of 18 and 35; we’ll need to join the patients table to handle that.
with_patients <- inner_join(after_date, patients, c("PATIENT" = "Id"))
with_patients
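An inner join silently drops any encounter whose PATIENT value has no match in the patients table. One way to check for that, sketched here on invented stand-in tables (toy_enc and toy_pat are hypothetical names), is an anti_join with the same keys.

```r
library(dplyr)

# Invented stand-ins: encounter e3 refers to a patient that is not in toy_pat.
toy_enc <- tibble(Id = c("e1", "e2", "e3"), PATIENT = c("p1", "p2", "p9"))
toy_pat <- tibble(
  Id = c("p1", "p2"),
  BIRTHDATE = as.Date(c("1975-01-01", "1982-05-20"))
)

# Encounters that would be lost by the inner join.
unmatched <- anti_join(toy_enc, toy_pat, by = c("PATIENT" = "Id"))
print(unmatched)

# The join itself: the patients' Id key is merged into PATIENT,
# so the Id column in the result is still the encounter Id.
joined <- inner_join(toy_enc, toy_pat, by = c("PATIENT" = "Id"))
print(names(joined))
```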
Based upon the wording in the specifications, the patient’s age must be greater than or equal to 18 at the start of an encounter and less than or equal to 35 at the end of the encounter.
Let’s make sure that there are no encounters in our table that have not ended, because a patient could turn 36 by the time the encounter is over. Dropping rows with a missing STOP removes any such encounters.
not_ended <- drop_na(with_patients, STOP)
not_ended
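Rather than comparing the printed tables by eye, the number of open encounters can be counted directly. This is a small sketch on an invented table (toy_joined, a hypothetical stand-in for with_patients) in which one encounter has no STOP.

```r
library(dplyr)
library(tidyr)

# Invented stand-in: encounter e2 has not ended, so its STOP is NA.
toy_joined <- tibble(
  Id = c("e1", "e2", "e3"),
  STOP = as.POSIXct(
    c("2001-03-01 10:00:00", NA, "2005-06-10 12:00:00"),
    tz = "UTC"
  )
)

# Count encounters with a missing STOP before dropping them.
n_open <- sum(is.na(toy_joined$STOP))

# drop_na() keeps only the rows where STOP is present.
ended <- drop_na(toy_joined, STOP)

# The number of rows removed matches the number of open encounters.
print(n_open)
print(nrow(toy_joined) - nrow(ended))
```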
Turns out we’re OK. Note that, despite its name, not_ended holds the encounters that do have a STOP value. Let’s do the filtering now. First we’ll need to calculate the patient’s age at the start and end of each encounter.
# Age in completed years at the start and end of each encounter
not_ended <- mutate(not_ended,
  AGEATSTART = as.period(interval(BIRTHDATE, START))$year,
  AGEATSTOP = as.period(interval(BIRTHDATE, STOP))$year)
select(not_ended, Id, AGEATSTART, AGEATSTOP)
aged <- filter(not_ended, AGEATSTART >= 18 & AGEATSTOP <= 35)
aged
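To see how the age calculation behaves around a birthday, here is a small worked example with invented dates: a patient born on 1980-06-15, with one encounter starting the day before the 18th birthday and one starting on it. It also shows integer division of the interval by years(1), an alternative way to get completed years that is not what the code above uses.

```r
library(lubridate)

# Invented dates: the same birth date paired with two encounter starts.
birth <- rep(ymd("1980-06-15"), 2)
starts <- ymd_hms(c("1998-06-14 09:00:00", "1998-06-15 09:00:00"), tz = "UTC")

# Approach used above: convert the interval to a period and take its year component.
age_period <- as.period(interval(birth, starts))$year

# Alternative: count how many whole years fit into the interval.
age_intdiv <- interval(birth, starts) %/% years(1)

print(age_period)  # 17 the day before the birthday, 18 on the birthday
print(age_intdiv)  # same values
```

Because the year component only counts completed years, a patient is treated as 35 up to the day before their 36th birthday, which is what the AGEATSTOP <= 35 condition relies on.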
That finishes up the first task.